library(genovisr)
#> Warning: replacing previous import 'ggplot2::last_plot' by 'plotly::last_plot'
#> when loading 'genovisr'
#> Warning: replacing previous import 'shiny::runExample' by 'shinyjs::runExample'
#> when loading 'genovisr'
library(GBScleanR)
#> Loading required package: SeqArray
#> Loading required package: gdsfmt
library(plotly)
#> Loading required package: ggplot2
#> 
#> Attaching package: 'plotly'
#> The following object is masked from 'package:ggplot2':
#> 
#>     last_plot
#> The following object is masked from 'package:stats':
#> 
#>     filter
#> The following object is masked from 'package:graphics':
#> 
#>     layout
library(Biostrings)
#> Loading required package: BiocGenerics
#> 
#> Attaching package: 'BiocGenerics'
#> The following objects are masked from 'package:stats':
#> 
#>     IQR, mad, sd, var, xtabs
#> The following objects are masked from 'package:base':
#> 
#>     anyDuplicated, aperm, append, as.data.frame, basename, cbind,
#>     colnames, dirname, do.call, duplicated, eval, evalq, Filter, Find,
#>     get, grep, grepl, intersect, is.unsorted, lapply, Map, mapply,
#>     match, mget, order, paste, pmax, pmax.int, pmin, pmin.int,
#>     Position, rank, rbind, Reduce, rownames, sapply, setdiff, table,
#>     tapply, union, unique, unsplit, which.max, which.min
#> Loading required package: S4Vectors
#> Loading required package: stats4
#> 
#> Attaching package: 'S4Vectors'
#> The following object is masked from 'package:plotly':
#> 
#>     rename
#> The following object is masked from 'package:utils':
#> 
#>     findMatches
#> The following objects are masked from 'package:base':
#> 
#>     expand.grid, I, unname
#> Loading required package: IRanges
#> 
#> Attaching package: 'IRanges'
#> The following object is masked from 'package:plotly':
#> 
#>     slice
#> Loading required package: XVector
#> Loading required package: GenomeInfoDb
#> 
#> Attaching package: 'Biostrings'
#> The following object is masked from 'package:base':
#> 
#>     strsplit

Genovis class object

Create dummy data to walk through the package with.

object <- simGenovis()

Marker information

The marker_info slot of the genovis class object stores the position and allele information of the genotyping markers. The id column shows marker IDs that have been created by concatenating the chromosome IDs and physical positions of markers.

# Marker information
head(object$marker_info)
#>         id chr    pos ref_allele alt_allele
#> 1   1_9518   1   9518          G          A
#> 2  1_50407   1  50407          G          A
#> 3 1_191311   1 191311          G          A
#> 4 1_199996   1 199996          G          A
#> 5 1_205100   1 205100          G          A
#> 6 1_247607   1 247607          G          A

The sample_info slot stores sample IDs. If your data was created by merging two genovis class objects with the pop argument, this slot also contains the pop column to indicate which populations the samples belong to.

# Sample information
head(object$sample_info)
#>         id
#> 1 sample_1
#> 2 sample_2
#> 3 sample_3
#> 4 sample_4
#> 5 sample_5
#> 6 sample_6

Genotype calls created by variant callers, e.g. samtools and GATK, are stored in the genotype slot as an N x M matrix, where N and M are the number of samples and markers, respectively.

# Genotype data
object$genotype[1:5, 1:5]
#>      [,1] [,2] [,3] [,4] [,5]
#> [1,]    0    1    1    1    0
#> [2,]    0    1    1    1    0
#> [3,]    0    0    2    2    0
#> [4,]    0    1    1    1    0
#> [5,]    0    1    1    1    0

Estimated haplotype data created by GBScleanR are stored in the haplotype slot as a P x N x M array, where P, N, and M are the number of ploidy, samples, and markers.

# Haplotype data
object$haplotype[, 1:5, 1:5]
#> , , 1
#> 
#>      [,1] [,2] [,3] [,4] [,5]
#> [1,]    1    1    2    2    2
#> [2,]    2    2    2    1    1
#> 
#> , , 2
#> 
#>      [,1] [,2] [,3] [,4] [,5]
#> [1,]    1    1    2    2    2
#> [2,]    2    2    2    1    1
#> 
#> , , 3
#> 
#>      [,1] [,2] [,3] [,4] [,5]
#> [1,]    1    1    2    2    2
#> [2,]    2    2    2    1    1
#> 
#> , , 4
#> 
#>      [,1] [,2] [,3] [,4] [,5]
#> [1,]    1    1    2    2    2
#> [2,]    2    2    2    1    1
#> 
#> , , 5
#> 
#>      [,1] [,2] [,3] [,4] [,5]
#> [1,]    1    1    2    2    2
#> [2,]    2    2    2    1    1

The dosage slot stores dosage data calculated from the haplotype data. The dosage data will be provided only if the given population shows segregation of two haplotypes, e.g. a biparental population derived from two inbred founders. The integer of the [i,j] element of the dosage data matrix indicates the number of the Hap2 allele at the jth marker of the ith sample.

# Dosage data
object$dosage[1:5, 1:5]
#>      [,1] [,2] [,3] [,4] [,5]
#> [1,]    1    1    1    1    1
#> [2,]    1    1    1    1    1
#> [3,]    2    2    2    2    2
#> [4,]    1    1    1    1    1
#> [5,]    1    1    1    1    1

Data import

From GBScleanR output

The genovis class object can be constructed directly from a GDS file created by the GBScleanR package. The GBScleanR package provides functions to convert Variant Call Format (VCF) files to Genotype Data Structure (GDS) files for quick access to large genotype data and estimate haplotypes and dosages of genotype markers based on allelic read counts using a hidden Markov model. The gbscleanr2genovis() function extracts marker, sample, and genotype information from a given GDS file. If the GDS file contains haplotype and dosage information, these data are also included in the output genovis class object.

object <- gbscleanr2genovis(gds_fn = "/path/to/your/gds_file.gds")

Data visualization

The genotype, haplotype, and dosage information in a genovis class object can be visualized using a variety of data visualization functions provided by the genovisr package.

Classical graphical genotype

Classical graphical genotype is a representation of genotypes by colored tiles to indicate which genomic region has which genomic segment descended from which parent. Therefore, the word “genotype” means descended “haplotype” in this context nowadays. The evalSegments() function summarizes breakpoints of haplotypes and dosages in the given genomes.

object <- evalSegments(object = object)

The plotGraphGeno() function visualizes haplotypes and dosages in a graphical representation based on the obtained breakpoint information.

plotGraphGeno(object = object, direction = "h", data = "haplotype", sample = 1:5, width = 0.1)

plotGraphGeno(object = object, direction = "h", data = "dosage", sample = 1:10, width = 0.1)

plotGraphGeno(object = object, direction = "v", data = "haplotype", sample = 1, width = 0.15)

plotGraphGeno(object = object, direction = "v", data = "dosage", sample = 1, width = 0.1)

The plotly package can be used to make these plots interactive.

p <- plotGraphGeno(object = object, direction = "v", data = "haplotype", sample = 1, width = 0.15)
ggplotly(p)
p <- plotGraphGeno(object = object, direction = "v", data = "dosage", sample = 1, width = 0.1)
ggplotly(p)

Statistical summary

We can also get the statistical summary of haplotype and dosage information using the statsGeno() function. The function requires a numeric vector indicating the lengths of chromosomes.

# If you have the reference genome FASTA, retrieve the chromosome lengths from the FASTA
# ref_genome <- readDNAStringSet(filepath = "genome.fa")
# ref_genome <- sort(ref_genome)
# chr_len <- width(ref_genome)
chr_len <- rep(42, 12) * 1e6
object <- statsGeno(object = object, chr_len = chr_len)

Visualize the obtained statistical summary using plotStats().

plotStats(object = object, data = "haplotype", group = "hap", value = "class")

plotStats(object = object, data = "dosage", group = "sample", value = "segment")

Histogram of segment lengths

The plotHist() function can be used to create histograms of segment lengths for haplotype or dosage data.

plotHist(object = object, data = "haplotype", binwidth = 1e6, fill = "darkgreen")

plotHist(object = object, data = "dosage", binwidth = 1e6, fill = "blue", chrwise = TRUE)

plotHist(object = object, data = "haplotype", binwidth = 1e6, fill = "red", samplewise = TRUE, ncol = 3)

Interactive Shiny Application

GenovisR also provides an interactive Shiny application for visualizing and analyzing genotypic data.

Launch the Shiny app

Use the shinyGenovir() function to launch the Shiny application.

shinyGenovir()

This application allows users to load GDS files, set parent samples, and visualize data interactively through various plots and controls. The Shiny app integrates many of the functions described above into a user-friendly interface.

Example: Loading a GDS file

Once the Shiny application is running, you can load a GDS file by selecting it from your file system. After the file is loaded, you can set the parent samples and visualize the genotype, haplotype, and dosage information interactively.

# Example of loading a GDS file within the Shiny app

Example: Visualizing data

The Shiny app provides controls for tweaking histogram parameters, graphical genotype parameters, and statistical parameters, allowing for dynamic and interactive data exploration.

# Example of visualizing data within the Shiny app

By following this vignette, you should be able to effectively use the GenovisR package to visualize and analyze genotypic data, whether through R scripts or an interactive Shiny application.